Executive summary
This report analyzes the Lego dataset downloaded from the course website
on 07.12.2023. It is divided into the following chapters:
- Libraries - presents the libraries used to prepare the report,
- Data loading - presents the code that loads the datasets,
- Data introduction - presents the datasets used for analysis, including their structure, dimensions, and basic statistics,
- Detailed analysis - presents a detailed analysis of attribute values,
- Correlation - presents the correlations between variables,
- Trends - presents data trends over the years,
- Forecasting - presents predictions of the number of sets in the future.
Conclusions:
- Lego parts colors
- Lego elements
  - The most popular element colors are black, white, red and yellow. The most popular colors of parts and of elements are very similar, because parts are composed of elements.
- Minifigs
  - The most common number of parts used to build a minifig is 4. Minifigs tend to be built from a small number of parts.
  - The most popular minifigs are “Skeleton”, “Battle Droid” and “Classic Spaceman”. The top 10 most popular minifigs include figures from the Star Wars films and the Minecraft game, which suggests that collaborations are important in the development of new minifigs.
- Themes
  - The most popular themes are “Gear”, “Duplo” and “Educational and Dacta”. Collaborations such as “Star Wars” are also very popular.
- Parts
- Sets
  - The sets with the most parts are “World Map”, “Eiffel Tower” and “The Ultimate Battle for Chima”. The sets with the most parts are intended for building rather than for play.
  - The correlation between the total number of sets and the year is 0.879. Given the rapid growth over the past 20 years, even more sets are likely to be produced annually in the future.
  - The chart of the number of sets over time for the most popular themes shows several trends. Over the past few years, the company has been releasing more and more “Books” theme sets. The number of “Gear” sets has also been growing. The number of Star Wars-themed sets has declined slightly in recent years, following the release of the ninth and final part of the film series in 2019.
  - Trends in sets over the years show that the Lego company is growing. More parts are used each year, and the average number of parts per set is increasing. The number of themes is also growing, providing a wider choice of subjects. The number of large sets (more than 1,000 parts) and the maximum number of parts per set have increased dramatically over the past few years. Only the median number of parts per set has remained stable, because large sets make up a small share of all sets produced and act as outliers.
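The last two conclusions rest on per-year aggregates: the number of sets released each year, its correlation with the year, and the mean versus the median number of parts per set. The following is a minimal sketch of that calculation in base R. It uses a small made-up table instead of the real sets data, and the column names `year` and `num_parts` are assumptions about the schema, so the numbers it prints are illustrative only.

```r
# Toy stand-in for the sets table (made-up values, not taken from the report).
# The column names `year` and `num_parts` are assumed.
sets_toy <- data.frame(
  year      = c(2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002, 2002),
  num_parts = c(10, 30, 20, 40, 1500, 15, 35, 45, 4000)
)

# One row per year: number of sets, mean and median parts per set.
by_year <- do.call(rbind, lapply(split(sets_toy, sets_toy$year), function(d) {
  data.frame(
    year         = d$year[1],
    n_sets       = nrow(d),
    mean_parts   = mean(d$num_parts),
    median_parts = median(d$num_parts)
  )
}))

# Pearson correlation between the year and the number of sets released that year.
r_sets_year <- cor(by_year$year, by_year$n_sets)

print(by_year)
print(r_sets_year)
```

In this toy data the two very large sets pull the yearly mean up sharply while the median barely moves, which is the same effect the report describes for the real data.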
Libraries
Libraries used to prepare the report.
library(knitr)
library(dplyr)
library(R.utils)
library(data.table)
library(tools)
library(stringr)
library(ggplot2)
library(plotly)
library(tidyr)
library(scales)
library(forecast)
Data loading
Code to load datasets from compressed files stored in a specified
folder.
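folder_name <- "data"
# List the compressed CSV files in the data folder.
csv_files <- list.files(folder_name,
                        pattern = "\\.csv\\.gz$",
                        full.names = FALSE)
# Strip the ".csv.gz" suffix to get the bare dataset names.
files_names <- file_path_sans_ext(csv_files, compression = TRUE)
# Read each file into a data.table named "<dataset>_df".
for (file_name in files_names) {
  assign(paste0(file_name, "_df"),
         fread(file.path(
           folder_name,
           paste0(file_name, ".csv.gz")
         )))
}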
csv_files
## [1] "colors.csv.gz" "elements.csv.gz"
## [3] "inventories.csv.gz" "inventory_minifigs.csv.gz"
## [5] "inventory_parts.csv.gz" "inventory_sets.csv.gz"
## [7] "minifigs.csv.gz" "part_categories.csv.gz"
## [9] "part_relationships.csv.gz" "parts.csv.gz"
## [11] "sets.csv.gz" "themes.csv.gz"
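As an alternative to creating one global variable per file with `assign()`, the datasets could be collected in a single named list, which is easier to loop over later. The sketch below is one possible way to do that. It swaps `fread()` for base R `read.csv()` so that it runs without extra packages, and the helper name `load_csv_gz` and the temporary demo folder are hypothetical, not part of the report.

```r
# Sketch: read every *.csv.gz file in a folder into one named list.
# `load_csv_gz` is a hypothetical helper, not part of the original report.
load_csv_gz <- function(folder) {
  files <- list.files(folder, pattern = "\\.csv\\.gz$", full.names = TRUE)
  # Strip the directory and the ".csv.gz" suffix to get names such as "colors".
  names(files) <- sub("\\.csv\\.gz$", "", basename(files))
  # read.csv() decompresses gzip files transparently.
  lapply(files, read.csv, stringsAsFactors = FALSE)
}

# Demo on a temporary folder so the sketch runs without the course data.
demo_dir <- file.path(tempdir(), "lego_demo")
dir.create(demo_dir, showWarnings = FALSE)
con <- gzfile(file.path(demo_dir, "colors.csv.gz"), "w")
write.csv(data.frame(id = c(0, 1), name = c("Black", "Blue")),
          con, row.names = FALSE)
close(con)

datasets <- load_csv_gz(demo_dir)
print(names(datasets))
print(dim(datasets$colors))
```

With the real data the call would be `load_csv_gz("data")`, and a table would be accessed as, for example, `datasets$colors` instead of `colors_df`.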
Data introduction
The section below presents the datasets used for analysis, including
their structure, dimensions, and basic statistics.
Total dataset size
| 1446639 | 45 | 8099232 | 0.29% |
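The four figures above appear to be the total number of rows, the total number of columns, the total number of cells, and the share of missing values across all twelve datasets. The first three match the sums of the per-dataset dimensions reported below, and the last one matches the reported NA counts divided by the total cell count. Totals like these could be computed as in the following sketch. The helper name `dataset_totals` and the toy data frames are hypothetical.

```r
# Sketch: aggregate size statistics over a list of data frames.
# `dataset_totals` is a hypothetical helper, not part of the original report.
dataset_totals <- function(datasets) {
  rows  <- sum(vapply(datasets, function(d) as.numeric(nrow(d)), numeric(1)))
  cols  <- sum(vapply(datasets, function(d) as.numeric(ncol(d)), numeric(1)))
  # Cells are counted per dataset (rows * columns) and then summed.
  cells <- sum(vapply(datasets, function(d) as.numeric(nrow(d)) * ncol(d), numeric(1)))
  nas   <- sum(vapply(datasets, function(d) as.numeric(sum(is.na(d))), numeric(1)))
  data.frame(
    rows     = rows,
    columns  = cols,
    cells    = cells,
    na_share = sprintf("%.2f%%", 100 * nas / cells)
  )
}

# Toy input: 3 x 2 and 2 x 1 data frames with one NA each.
toy <- list(
  a = data.frame(x = c(1, NA, 3), y = c("a", "b", "c")),
  b = data.frame(z = c(NA, 2))
)
totals <- dataset_totals(toy)
print(totals)
```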
Data structure

Dataset summaries
Colors
Colors dataset dimensions
| 263 | 4 |
Colors dataset basic statistics
| Min. : -1.0 | Length:263 | Length:263 | Length:263 |
| 1st Qu.: 83.0 | Class :character | Class :character | Class :character |
| Median :1005.0 | Mode :character | Mode :character | Mode :character |
| Mean : 651.4 | | | |
| 3rd Qu.:1070.5 | | | |
| Max. :9999.0 | | | |
Head of Colors dataset
| -1 | [Unknown] | 0033B2 | f |
| 0 | Black | 05131D | f |
| 1 | Blue | 0055BF | f |
| 2 | Green | 237841 | f |
| 3 | Dark Turquoise | 008F9B | f |
| 4 | Red | C91A09 | f |
Elements
Elements dataset dimensions
| 84138 | 4 |
Elements dataset basic statistics
| Min. : 9327 | Length:84138 | Min. : -1.0 | Min. : 1001 |
| 1st Qu.: 4259774 | Class :character | 1st Qu.: 8.0 | 1st Qu.: 18454 |
| Median : 6057754 | Mode :character | Median : 28.0 | Median : 41748 |
| Mean : 5222065 | | Mean : 539.7 | Mean : 45570 |
| 3rd Qu.: 6262024 | | 3rd Qu.: 135.0 | 3rd Qu.: 75474 |
| Max. :61532443 | | Max. :9999.0 | Max. :107520 |
| | | | NA’s :23682 |
Head of Elements dataset
| 6443403 | 2277c01pr0009 | 1 | 2277 |
| 6300211 | 67906c01 | 14 | 67908 |
| 4566309 | 2564 | 0 | 2564 |
| 4275423 | 53657 | 1004 | 53657 |
| 6194308 | 92926 | 71 | 28967 |
| 6229123 | 26561 | 4 | 26561 |
Inventories
Inventories dataset dimensions
| 37265 | 3 |
Inventories dataset basic statistics
| Min. : 1 | Min. : 1.000 | Length:37265 |
| 1st Qu.: 14424 | 1st Qu.: 1.000 | Class :character |
| Median : 54379 | Median : 1.000 | Mode :character |
| Mean : 61104 | Mean : 1.091 | |
| 3rd Qu.: 88842 | 3rd Qu.: 1.000 | |
| Max. :194312 | Max. :16.000 | |
Head of Inventories dataset
| 1 | 1 | 7922-1 |
| 3 | 1 | 3931-1 |
| 4 | 1 | 6942-1 |
| 15 | 1 | 5158-1 |
| 16 | 1 | 903-1 |
| 17 | 1 | 850950-1 |
Inventory minifigs
Inventory minifigs dataset dimensions
| 20858 | 3 |
Inventory minifigs dataset basic statistics
| Min. : 3 | Length:20858 | Min. : 1.000 |
| 1st Qu.: 7869 | Class :character | 1st Qu.: 1.000 |
| Median : 15681 | Mode :character | Median : 1.000 |
| Mean : 43010 | | Mean : 1.062 |
| 3rd Qu.: 66834 | | 3rd Qu.: 1.000 |
| Max. :194312 | | Max. :100.000 |
Head of Inventory minifigs dataset
| 3 | fig-001549 | 1 |
| 4 | fig-000764 | 1 |
| 19 | fig-000555 | 1 |
| 25 | fig-000574 | 1 |
| 26 | fig-000842 | 1 |
| 26 | fig-008641 | 1 |
Inventory parts
Inventory parts dataset dimensions
| 1180987 | 6 |
Inventory parts dataset basic statistics
| Min. : 1 | Length:1180987 | Min. : -1.0 | Min. : 1.00 | Length:1180987 | Length:1180987 |
| 1st Qu.: 9404 | Class :character | 1st Qu.: 4.0 | 1st Qu.: 1.00 | Class :character | Class :character |
| Median : 22838 | Mode :character | Median : 15.0 | Median : 2.00 | Mode :character | Mode :character |
| Mean : 50849 | | Mean : 131.8 | Mean : 3.37 | | |
| 3rd Qu.: 87088 | | 3rd Qu.: 71.0 | 3rd Qu.: 4.00 | | |
| Max. :194312 | | Max. :9999.0 | Max. :3064.00 | | |
Inventory sets
Inventory sets dataset dimensions
| 4358 | 3 |
Inventory sets dataset basic statistics
| Min. : 35 | Length:4358 | Min. : 1.000 |
| 1st Qu.: 8076 | Class :character | 1st Qu.: 1.000 |
| Median : 16423 | Mode :character | Median : 1.000 |
| Mean : 52519 | | Mean : 1.813 |
| 3rd Qu.: 98685 | | 3rd Qu.: 1.000 |
| Max. :191576 | | Max. :60.000 |
Head of Inventory sets dataset
| 35 | 75911-1 | 1 |
| 35 | 75912-1 | 1 |
| 39 | 75048-1 | 1 |
| 39 | 75053-1 | 1 |
| 50 | 4515-1 | 1 |
| 50 | 4520-1 | 2 |
Minifigs
Minifigs dataset dimensions
| 13764 | 4 |
Minifigs dataset basic statistics
| Length:13764 | Length:13764 | Min. : 0.000 | Length:13764 |
| Class :character | Class :character | 1st Qu.: 4.000 | Class :character |
| Mode :character | Mode :character | Median : 4.000 | Mode :character |
| | | Mean : 5.296 | |
| | | 3rd Qu.: 5.000 | |
| | | Max. :156.000 | |
Part categories
Part categories dataset dimensions
| 66 | 2 |
Part categories dataset basic statistics
| Min. : 1.00 | Length:66 |
| 1st Qu.:19.25 | Class :character |
| Median :35.50 | Mode :character |
| Mean :35.36 | |
| 3rd Qu.:51.75 | |
| Max. :68.00 | |
Head of Part categories dataset
| 1 | Baseplates |
| 3 | Bricks Sloped |
| 4 | Duplo, Quatro and Primo |
| 5 | Bricks Special |
| 6 | Bricks Wedged |
| 7 | Containers |
Part relationships
Part relationships dataset dimensions
| 29977 | 3 |
Part relationships dataset basic statistics
| Length:29977 | Length:29977 | Length:29977 |
| Class :character | Class :character | Class :character |
| Mode :character | Mode :character | Mode :character |
Head of Part relationships dataset
| P | 3626cpr3662 | 3626c |
| P | 87079pr9974 | 87079 |
| P | 3960pr9971 | 3960 |
| R | 98653pr0003 | 98086pr0003 |
| R | 98653pr0003 | 98088pat0003 |
| R | 98653pr0003 | 98089pat0003 |
Parts
Parts dataset dimensions
| 52615 | 4 |
Parts dataset basic statistics
| Length:52615 | Length:52615 | Min. : 1.00 | Length:52615 |
| Class :character | Class :character | 1st Qu.:17.00 | Class :character |
| Mode :character | Mode :character | Median :41.00 | Mode :character |
| | | Mean :38.91 | |
| | | 3rd Qu.:60.00 | |
| | | Max. :68.00 | |
Head of Parts dataset
| 003381 | Sticker Sheet for Set 663-1 | 58 | Plastic |
| 003383 | Sticker Sheet for Sets 618-1, 628-2 | 58 | Plastic |
| 003402 | Sticker Sheet for Sets 310-3, 311-1, 312-3 | 58 | Plastic |
| 003429 | Sticker Sheet for Set 1550-1 | 58 | Plastic |
| 003432 | Sticker Sheet for Sets 357-1, 355-1, 940-1 | 58 | Plastic |
| 003434 | Sticker Sheet for Set 575-2, 653-1, 460-1 | 58 | Plastic |
Sets
Sets dataset dimensions
| 21880 | 6 |
Sets dataset basic statistics
| Length:21880 | Length:21880 | Min. :1949 | Min. : 1 | Min. : 0.0 | Length:21880 |
| Class :character | Class :character | 1st Qu.:2001 | 1st Qu.:273 | 1st Qu.: 3.0 | Class :character |
| Mode :character | Mode :character | Median :2012 | Median :497 | Median : 31.0 | Mode :character |
| | | Mean :2008 | Mean :442 | Mean : 161.4 | |
| | | 3rd Qu.:2018 | 3rd Qu.:608 | 3rd Qu.: 139.0 | |
| | | Max. :2024 | Max. :752 | Max. :11695.0 | |
Themes
Themes dataset dimensions
| 468 | 3 |
Themes dataset basic statistics
| Min. : 1.0 | Length:468 | Min. : 1.0 |
| 1st Qu.:250.5 | Class :character | 1st Qu.:186.0 |
| Median :466.0 | Mode :character | Median :411.0 |
| Mean :433.5 | | Mean :360.6 |
| 3rd Qu.:625.2 | | 3rd Qu.:512.5 |
| Max. :752.0 | | Max. :697.0 |
| | | NA’s :145 |
Head of Themes dataset
| 1 | Technic | |
| 3 | Competition | 1 |
| 4 | Expert Builder | 1 |
| 16 | RoboRiders | 1 |
| 17 | Speed Slammers | 1 |
| 18 | Star Wars | 1 |
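Each dataset above is described by the same three pieces: its dimensions, the output of `summary()`, and its first rows. A small helper along the following lines could produce all three for any table. This is only a sketch: the name `describe_dataset` is hypothetical, and the toy table reuses three rows from the head of the Colors dataset with assumed column names `id` and `name`.

```r
# Sketch: collect dimensions, basic statistics and the first rows of a table.
# `describe_dataset` is a hypothetical helper, not part of the original report.
describe_dataset <- function(df, n = 6) {
  list(
    dimensions = c(rows = nrow(df), columns = ncol(df)),
    statistics = summary(df),
    head       = head(df, n)
  )
}

# Toy table; the column names `id` and `name` are assumed.
colors_toy <- data.frame(
  id   = c(-1, 0, 1),
  name = c("[Unknown]", "Black", "Blue"),
  stringsAsFactors = FALSE
)

info <- describe_dataset(colors_toy, n = 2)
print(info$dimensions)
print(info$statistics)
print(info$head)
```

In an R Markdown report each element could then be passed to `knitr::kable()` to render it as a table.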
Detailed analysis
Colors
Most popular colors of parts
Distribution of colors by transparency
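The charts for this subsection are not reproduced here. As a rough illustration of how such figures could be derived, the sketch below sums part quantities per color and counts colors by their transparency flag. The column names (`color_id`, `quantity`, `id`, `name`, `is_trans`), the toy values, and the choice of total quantity as the popularity measure are all assumptions, not the report's confirmed method.

```r
# Toy stand-ins with assumed column names and made-up values.
colors_toy <- data.frame(
  id       = c(0, 1, 4, 47),
  name     = c("Black", "Blue", "Red", "Trans-Clear"),
  is_trans = c("f", "f", "f", "t"),
  stringsAsFactors = FALSE
)
inventory_parts_toy <- data.frame(
  color_id = c(0, 0, 1, 4, 47, 0),
  quantity = c(5, 3, 2, 4, 1, 2)
)

# Total quantity of parts per color, joined with the color names.
qty_by_color <- aggregate(quantity ~ color_id, data = inventory_parts_toy, FUN = sum)
popular_colors <- merge(qty_by_color, colors_toy, by.x = "color_id", by.y = "id")
popular_colors <- popular_colors[order(-popular_colors$quantity), c("name", "quantity")]

# Number of colors that are transparent ("t") versus opaque ("f").
trans_counts <- table(colors_toy$is_trans)

print(popular_colors)
print(trans_counts)
```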

Elements
Most popular element colors
Minifigs
Most common number of parts used to build minifigs
Most popular minifigs
Most popular minifigs
| Skeleton, Standard Face, Ball Joint Arms (3626b Head) | 43 |
| Battle Droid, One Bent Arm, One Straight Arm | 40 |
| Classic Spaceman, Red with Airtanks (3842a Helmet) | 39 |
| Classic Spaceman, White with Airtanks (3842a Helmet) | 37 |
| Steve | 27 |
| Policeman, Black Suit with Pocket and Badge, White Hat (3626a Head) | 24 |
| Battle Droid, Two Bent Arms | 21 |
| Classic Spaceman, Yellow with Airtanks (3842b Helmet) | 21 |
| Johnny Thunder (Desert) | 21 |
| Chewbacca, Reddish Brown | 20 |
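A ranking like the one above could be obtained by counting how often each minifig appears in the inventory table and joining the counts with the minifig names. The report does not state how popularity was measured, so the sketch below simply counts inventory rows per figure as one plausible definition. The column names (`fig_num`, `name`, `num_parts`, `inventory_id`), the figure ids and the toy values are assumptions. The last line also shows one way to find the most common number of parts per minifig.

```r
# Toy stand-ins with assumed column names and made-up ids and values.
minifigs_toy <- data.frame(
  fig_num   = c("fig-A", "fig-B", "fig-C"),
  name      = c("Skeleton", "Battle Droid", "Steve"),
  num_parts = c(4, 4, 5),
  stringsAsFactors = FALSE
)
inventory_minifigs_toy <- data.frame(
  inventory_id = c(3, 4, 19, 25, 26, 26),
  fig_num      = c("fig-A", "fig-B", "fig-A", "fig-A", "fig-B", "fig-C"),
  stringsAsFactors = FALSE
)

# Count the inventory rows in which each figure appears.
appearances <- as.data.frame(
  table(fig_num = inventory_minifigs_toy$fig_num),
  responseName = "n_inventories",
  stringsAsFactors = FALSE
)

# Attach the names and keep the ten most frequent figures.
top_minifigs <- merge(appearances, minifigs_toy, by = "fig_num")
top_minifigs <- top_minifigs[order(-top_minifigs$n_inventories),
                             c("name", "n_inventories")]
top_minifigs <- head(top_minifigs, 10)

# Most common number of parts per minifig (the mode of num_parts).
most_common_size <- as.numeric(names(which.max(table(minifigs_toy$num_parts))))

print(top_minifigs)
print(most_common_size)
```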
Parts
Most popular part materials
Most popular part categories